Architectural Support for an Efficient Implementation of a Software-Only Directory Cache Coherence Protocol
Abstract
Software-only directory cache coherence protocols emulate directory management through handlers executed on the compute processor in shared-memory multiprocessors. While their potential lies in lower implementation cost and complexity than traditional hardware-only directory protocols, the penalty for cache misses induced by application data accesses as well as directory accesses is a critical issue to address. In this paper, we study important support mechanisms for software-only directory protocols in the context of a processor node organization for a cache-coherent NUMA architecture. We find that it is possible to remove or hide software handler latency for local as well as remote read misses by adopting simple hardware support mechanisms. To further reduce the overhead of software handler execution, we study the effects of directory data caching. While this could pollute the caches, our results suggest that this effect is marginal and that software handler execution overhead is drastically reduced by allowing caching. Overall, the software-only directory protocol enhanced with the miss handling support mechanisms performs competitively with hardware-only protocols at lower cost and complexity.
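To make the idea of directory management by software handlers concrete, here is a minimal sketch of a generic three-state invalidation directory. It is not the protocol evaluated in the paper: the class names, states, and message kinds are hypothetical, and a real handler would run on a node's compute processor and send network messages rather than append to a list.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class State(Enum):
    UNCACHED = 0
    SHARED = 1
    MODIFIED = 2


@dataclass
class DirectoryEntry:
    # Hypothetical per-block directory state kept in ordinary memory.
    state: State = State.UNCACHED
    sharers: set = field(default_factory=set)  # node ids holding a copy
    owner: Optional[int] = None                # node holding the dirty copy


class SoftwareDirectory:
    """Assumed home-node directory whose handlers are plain software."""

    def __init__(self):
        self.entries = {}
        self.messages = []  # log of (kind, target_node, block)

    def _entry(self, block):
        return self.entries.setdefault(block, DirectoryEntry())

    def handle_read_miss(self, block, requester):
        entry = self._entry(block)
        if entry.state is State.MODIFIED:
            # Fetch the dirty copy back; the old owner keeps a shared copy.
            self.messages.append(("fetch", entry.owner, block))
            entry.sharers = {entry.owner}
            entry.owner = None
        entry.sharers.add(requester)
        entry.state = State.SHARED
        self.messages.append(("data", requester, block))

    def handle_write_miss(self, block, requester):
        entry = self._entry(block)
        if entry.state is State.MODIFIED and entry.owner != requester:
            self.messages.append(("fetch_invalidate", entry.owner, block))
        # Only caches recorded as sharers are notified.
        for node in sorted(entry.sharers - {requester}):
            self.messages.append(("invalidate", node, block))
        entry.sharers = set()
        entry.owner = requester
        entry.state = State.MODIFIED
        self.messages.append(("data_exclusive", requester, block))
```

Every call above touches directory state held in memory, which is why the abstract treats handler latency and the caching of directory data as the main costs to reduce.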
Similar resources
Evaluation of Design Alternatives for a Directory-Based Cache Coherence Protocol in Shared-Memory Multiprocessors
In shared-memory multiprocessors, caches are attached to the processors in order to reduce the memory access latency. To keep the memory consistent, a cache coherence protocol is needed. A well-known approach is to record which caches have copies of a memory block in a directory and only notify the caches having a copy when a processor modifies the block. Such a protocol is called a directory-b...
Phase-Priority based Directory Coherence for Multicore Processor
As the number of cores in a single chip increases, a typical implementation of a coherence protocol adds significant hardware and complexity overhead. Besides, the performance of a CMP system depends on the data access latency, which is highly affected by the coherence protocol and the on-chip interconnect. In this paper, we propose PPB (Phase-Priority Based) cache coherence protocol, an optimization of mod...
So Many States, So Little Time: Verifying Memory Coherence in the Cray X1
This paper investigates a complexity-effective technique for verifying a highly distributed directory-based cache coherence protocol. We develop a novel approach called “witness strings” that combines both formal and informal verification methods to expose design errors within the cache coherence protocol and its Verilog implementation. In this approach a formal execution trace is extracted dur...
Issues in Software Cache Coherence
Large-scale multiprocessors can provide the computational power needed to solve some of the larger problems of science and engineering today. Shared memory provides an attractive and intuitive programming model that makes good use of programmer time and effort. Shared memory, however, requires a coherence mechanism to allow caching for performance and to ensure that processors do not use stale da...
Toward Complexity-Effective Verification: A Case Study of the Cray SV2 Cache Coherence Protocol
Modern large-scale multiprocessors, capable of scaling to hundreds or thousands of processors, have proven to be very difficult to design and verify in a timely manner. In particular, the verification process, i.e., proving that the design is functionally correct, is often the most time-consuming aspect of developing the system. This paper discusses the methodology and early experiences of veri...
Publication date: 1995